Towards an immortal operating system in virtual environments

Authors

  • Joefon Jann
  • R. Sarma Burugula
  • Ching-Farn E. Wu
  • Kaoutar El Maghraoui
Abstract

Many OS crashes are caused by bugs in kernel extensions or device drivers, even when the OS itself has been tested rigorously. To make an OS immortal, we must resurrect it from these crashes. We present a novel OS-hypervisor infrastructure that allows automated and transparent OS crash diagnosis and recovery in a virtual environment. This infrastructure eliminates the need for reboots or checkpoint-restart mechanisms, which require preserving the states of critical applications before the crash happens and also require extensive modifications to those applications. At the core of our approach is a small hidden OS repair image that is dynamically created from the healthy running OS instance. When an OS crashes, the hypervisor dynamically loads this repair image to perform diagnosis and repair. One repair method we have experimented with is to quarantine the offending process and automatically resume the repaired OS without a reboot. Experimental evaluations demonstrated that it takes less than 3 seconds to recover from an OS crash. This approach can significantly reduce downtime and maintenance costs in data centers, and it is the first design and implementation of an OS-hypervisor combination capable of automatically resurrecting a crashed commercial server OS. In addition to online diagnosis and recovery, this infrastructure can be used for offline diagnosis and can be incorporated into the technical support tools of the OS vendor. We have also used parts of this infrastructure to speed up the diagnosis of AIX OS crashes for the IBM technical support teams.
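The recovery flow described in the abstract can be pictured with a toy model. The sketch below is only an illustration under stated assumptions: the class and method names (GuestOS, RepairImage, Hypervisor, on_guest_crash) are hypothetical, the "diagnosis" simply reads a recorded culprit, and nothing here reflects the actual AIX or hypervisor interfaces used by the authors.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class GuestOS:
    """Toy stand-in for a guest OS instance (hypothetical, not AIX)."""
    processes: Set[str]
    crashed: bool = False
    faulting_process: Optional[str] = None
    quarantined: Set[str] = field(default_factory=set)

    def crash(self, culprit: str) -> None:
        # Record which process triggered the fault and stop running.
        self.crashed = True
        self.faulting_process = culprit


class RepairImage:
    """Toy repair image; assumed to be built while the guest is healthy."""

    @classmethod
    def from_healthy_guest(cls, guest: GuestOS) -> "RepairImage":
        if guest.crashed:
            raise ValueError("repair image must come from a healthy guest")
        return cls()

    def diagnose(self, guest: GuestOS) -> Optional[str]:
        # Assumption: diagnosis is reduced to reading the recorded culprit.
        return guest.faulting_process

    def repair(self, guest: GuestOS) -> None:
        culprit = self.diagnose(guest)
        if culprit is not None:
            # Quarantine the offending process instead of rebooting.
            guest.processes.discard(culprit)
            guest.quarantined.add(culprit)
        guest.faulting_process = None
        guest.crashed = False  # the guest resumes with other processes intact


class Hypervisor:
    """Toy hypervisor that keeps a hidden repair image for one guest."""

    def __init__(self, guest: GuestOS) -> None:
        self.guest = guest
        self.repair_image = RepairImage.from_healthy_guest(guest)

    def on_guest_crash(self) -> None:
        # On a crash, load the repair image to diagnose and repair the guest.
        if self.guest.crashed:
            self.repair_image.repair(self.guest)


guest = GuestOS(processes={"db", "web", "faulty_app"})
hypervisor = Hypervisor(guest)
guest.crash("faulty_app")
hypervisor.on_guest_crash()
```

In this toy run the guest ends up running again with "db" and "web" untouched and "faulty_app" moved to the quarantined set, which mirrors the quarantine-and-resume repair the abstract mentions.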

Similar resources

Live Migration of User Environments Across Wide Area Networks

A complex challenge in mobile computing is to allow the user to migrate her highly customised environment while moving to a different location and to continue work without interruption. I motivate why this is a highly desirable capability and conduct a survey of the current approaches towards this goal and explain their limitations. I then propose a new architecture to support user mobility by ...

Surgical Technology Students' Attitudes Towards Virtual Education and Related Factors in Covid-19 Pandemic in Mazandaran and Mashhad Universities of Medical Sciences, 2021

A Multi-objective Optimization Model for Dynamic Virtual Cellular Manufacturing Systems

Companies and firms, nowadays, due to mounting competition and product diversity, seek to apply virtual cellular manufacturing systems to reduce production costs and improve quality of the products. In addition, as a result of rapid advancement of technology and the reduction of product life cycle, production systems have turned towards dynamic production environments. Dynamic cellular manufact...

Towards a High Integrity Real-Time Java Virtual Machine

This paper defines a run-time architecture for a Java Virtual Machine (JVM) that supports the Ravenscar-Java profile (RJVM). This architecture introduces an early class loading and verifying model that can facilitate the predictable efficient execution of Java applications, detect program errors at the initialization phase and prevent errors occurring during the mission phase. A pre-emptive fix...

Towards Scalable Multiprocessor Virtual Machines

A multiprocessor virtual machine benefits its guest operating system in supporting scalable job throughput and request latency—useful properties in server consolidation where servers require several of the system processors for steady state or to handle load bursts. Typical operating systems, optimized for multiprocessor systems in their use of spin-locks for critical sections, can defeat flexi...

Journal:
  • Parallel Computing

Volume 40

Publication date: 2014